Exploiting the latent structure in many real-world signals can dramatically increase algorithmic
robustness to both noise and missing data. The theory of compressed sensing shows that if a signal
of interest is sparse, that is, well-approximated by some small subset of a dictionary of basis
elements, then the signal can be acquired from a reduced number of measurements and reconstructed
using efficient convex programming techniques. However, the standard compressed sensing theory is
valid only for a restrictive set of dictionaries, limiting the scope of applications.

During the tenure of this award, as anticipated, the PI developed a range of reliable and
structure-aware sampling theorems based on the weighted sparsity model for real-world systems
that are governed mostly by low-order interactions. The weighted sparsity model and weighted
sampling allow for more freedom than linear regression but provide sufficient structure to extend
compressed sensing results to a wide class of infinite-dimensional problems. We discuss four key
findings arising from this project, related to uncertainty quantification, image processing, matrix
completion, and stochastic optimization, respectively.

In paper (P1), we considered the problem of function interpolation and provided a theoretical
basis for weighted sparse approximation. We provided weighted stochastic sampling strategies for
interpolating sparse or compressible expansions in orthogonal polynomial bases from a minimal
number of pointwise function evaluations. Based on a model of weighted sparsity which we
introduced, we provided error rates and choices of weights for regularization via weighted ℓ1
minimization. We later extended this work to overcomplete dictionaries (P2) and refined the sample
complexity analysis for Gaussian measurements in (P3). Our work has found interest and
application in uncertainty quantification, namely in the polynomial chaos approach, where one
approximates the dependence of simulation model output on model parameters by expansion in an
orthogonal polynomial basis.
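
To make the weighted sparsity model concrete, the following is a minimal sketch, not the
implementation from (P1). It assumes the Legendre basis normalized with respect to the uniform
probability measure on [-1, 1], for which the j-th basis function has sup-norm sqrt(2j + 1), and
it takes these sup-norms as the weights. The function names are hypothetical and chosen only for
illustration.

```python
import math

def legendre_weights(n):
    """Assumed weights w_j = sqrt(2j + 1): the sup-norms on [-1, 1] of the
    Legendre polynomials orthonormal w.r.t. the uniform probability measure."""
    return [math.sqrt(2 * j + 1) for j in range(n)]

def weighted_l1_norm(coeffs, weights):
    """Weighted l1 norm: sum_j w_j * |c_j|, the quantity being minimized."""
    return sum(w * abs(c) for c, w in zip(coeffs, weights))

def weighted_sparsity(coeffs, weights, tol=0.0):
    """Weighted sparsity: sum of w_j^2 over the support of the coefficients.
    With all weights equal to 1 this reduces to the usual count of nonzeros."""
    return sum(w * w for c, w in zip(coeffs, weights) if abs(c) > tol)

if __name__ == "__main__":
    w = legendre_weights(4)
    c = [1.0, 0.0, 0.5, 0.0]          # expansion supported on degrees 0 and 2
    print(weighted_l1_norm(c, w))     # 1 + 0.5 * sqrt(5)
    print(weighted_sparsity(c, w))    # 1 + 5, up to floating-point rounding
```

Under these assumed weights, a coefficient on a high-degree polynomial costs more than one on a
low-degree polynomial, both in the weighted ℓ1 norm and in the weighted sparsity count, which is
how the model favors expansions dominated by low-order terms.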

In paper (P6), we considered the application of weighted sampling to medical imaging, where
one seeks to recover a good approximation of an image that is sparse in terms of its spatial
finite differences and wavelet transform coefficients from a subset of measurements in the Fourier
domain. We formulated the notion of local coherence in the discrete setting and bounded the inner
products between Fourier and Haar wavelet basis elements. This yielded near-optimal reconstruction
guarantees when frequencies are sampled from a fixed distribution, in which a frequency component
is sampled with probability inversely proportional to its squared magnitude, and the image is
recovered from these samples via total variation minimization.
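
The variable-density sampling described here can be sketched as follows. This is an illustrative
sketch only: it assumes an inverse-square-law density on a two-dimensional frequency grid, with
probability proportional to 1 / max(1, k1^2 + k2^2) so that the zero frequency stays finite, and it
draws samples with replacement. The grid convention and the function names are assumptions, and
the total variation reconstruction step is not shown.

```python
import random

def inverse_square_density(n):
    """Frequencies (k1, k2) on an n-by-n grid centered at zero, with
    probabilities proportional to 1 / max(1, k1^2 + k2^2)."""
    ks = range(-n // 2 + 1, n // 2 + 1)   # e.g. -3..4 for n = 8
    freqs = [(k1, k2) for k1 in ks for k2 in ks]
    raw = [1.0 / max(1, k1 * k1 + k2 * k2) for k1, k2 in freqs]
    total = sum(raw)
    return freqs, [p / total for p in raw]

def sample_frequencies(n, m, seed=0):
    """Draw m frequencies (with replacement) from the density above."""
    freqs, probs = inverse_square_density(n)
    rng = random.Random(seed)
    return rng.choices(freqs, weights=probs, k=m)

if __name__ == "__main__":
    # Low frequencies appear far more often than high ones.
    print(sample_frequencies(8, 10))
```

The density concentrates samples on low frequencies, where Fourier and low-order Haar wavelet
elements are most strongly correlated, while still sampling high frequencies occasionally.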

Matrix completion refers to the problem of recovering a low-rank matrix from a small subset
of its elements. In papers (P4, P5), we applied the concept of weighted sampling to extend the
state-of-the-art results for this problem. Matrix completion was previously known to be possible
when the matrix satisfies a restrictive structural constraint, known as incoherence, on its row
and column spaces and the subset of elements is sampled uniformly at random. We showed that any
n-by-n rank-r matrix can be exactly recovered from as few as O(nr) randomly chosen elements, up
to logarithmic factors, provided this random choice is made according to a specific biased
distribution: the probability of any element being sampled should be proportional to the sum of
the leverage scores of the corresponding row and column.
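
The biased distribution can be illustrated with a short sketch. It assumes the matrix is given
through low-rank factors whose columns span its column space and row space (in practice these are
unknown, so this is for illustration only), computes leverage scores from an orthonormal basis
obtained by Gram-Schmidt, and normalizes the sums of row and column scores into a single
probability table. The function names and the normalization are assumptions, not the exact
sampling model of (P4, P5).

```python
import math

def orthonormalize(columns):
    """Modified Gram-Schmidt on a list of linearly independent column vectors."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis

def leverage_scores(columns):
    """Normalized leverage scores (n / r) * ||row i of U||^2, where the r
    columns of U are an orthonormal basis for the span of `columns`."""
    basis = orthonormalize(columns)
    n, r = len(columns[0]), len(columns)
    return [(n / r) * sum(q[i] ** 2 for q in basis) for i in range(n)]

def sampling_probabilities(col_space, row_space):
    """Probability of sampling entry (i, j), proportional to the sum of the
    leverage scores of row i and column j."""
    mu = leverage_scores(col_space)   # row leverage scores
    nu = leverage_scores(row_space)   # column leverage scores
    total = sum(m + v for m in mu for v in nu)
    return [[(m + v) / total for v in nu] for m in mu]

if __name__ == "__main__":
    # Rank-1 example: the first row carries all of the column-space energy.
    probs = sampling_probabilities([[1.0, 0.0, 0.0, 0.0]],
                                   [[1.0, 1.0, 1.0, 1.0]])
    print(probs[0][0], probs[1][0])
```

In the rank-1 example the matrix is nonzero only in its first row, so that row has a large
leverage score and its entries are sampled more often than under uniform sampling.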

Finally, we applied weighted sampling theorems to a seemingly very different application in
large-scale machine learning: stochastic gradient descent (SGD). SGD is an iterative procedure for
minimizing a high-dimensional function whereby, at each step, one chooses an index and descends
along the direction of the gradient of the corresponding constituent function, repeating until
convergence to within a prescribed tolerance. In huge-scale optimization problems, SGD is an
effective surrogate for full gradient descent, which is too expensive. The default sampling
strategy in stochastic gradient methods is to sample component directions for descent uniformly at
random. In reference (P7), we showed that re-weighting the sampling distribution so that components
with larger variation are sampled with higher probability is necessary in order to improve
convergence over uniform sampling and to obtain a linear dependence on the average, as opposed to
worst-case, smoothness among the constituent functions. Our results are based on a connection
between SGD and the randomized Kaczmarz algorithm, which had until this point been studied
essentially independently from SGD. This connection allowed us to transfer ideas between the
separate bodies of literature studying each of the two methods.
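
The connection between SGD and randomized Kaczmarz can be seen in a small sketch. For a consistent
linear system Ax = b, the randomized Kaczmarz method picks row i with probability proportional to
its squared norm and projects the iterate onto the hyperplane defined by that row, which is an
SGD step on the least-squares objective with a non-uniform sampling distribution. The code below
is a minimal illustration under these assumptions, with a hypothetical function name; it is not
the algorithm or analysis of (P7).

```python
import random

def weighted_kaczmarz(A, b, iters=500, seed=0):
    """Randomized Kaczmarz for a consistent system A x = b.
    Row i is sampled with probability proportional to ||a_i||^2."""
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    for i in rng.choices(range(len(A)), weights=row_norms_sq, k=iters):
        # Residual of equation i at the current iterate.
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        # Project x onto the hyperplane {z : <a_i, z> = b_i}.
        step = residual / row_norms_sq[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x

if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [2.0, -2.0, -1.0]            # consistent with x = [1, -2]
    print(weighted_kaczmarz(A, b))
```

Rows with larger norm correspond to constituent functions with larger gradient variation, so
sampling them more often is a simple instance of the re-weighting idea described above.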